Abstract

We present a multi-channel P2P Video-on-Demand (VoD) system using "plug-and-play" helpers. Helpers are
heterogeneous "micro-servers" with limited storage, bandwidth and number of users they can serve simultaneously. Our
proposed system has the following salient features: (1) it minimizes the server load; (2) it is distributed, requires
little or no maintenance overhead, and can easily adapt to system dynamics; and (3) it is adaptable to varying
supply and demand patterns across multiple video channels irrespective of video popularity. Our proposed solution
jointly optimizes over helper-user topology, video storage allocation and bandwidth allocation. The combinatorial
nature of the problem and the system demand for distributed algorithms make the problem uniquely challenging. By
utilizing Lagrangian decomposition and Markov chain approximation based arguments, we address this challenge by
designing two distributed algorithms running in tandem: a primal-dual storage and bandwidth allocation algorithm
and a "soft-worst-neighbor-choking" topology-building algorithm. Our scheme provably converges to a near-optimal
solution, and is easy to implement in practice. Simulation results validate that the proposed scheme achieves minimum
server load under highly heterogeneous combinations of supply and demand patterns, and is robust to system dynamics
of user/helper churn, user/helper asynchrony, and random delays in the network.

I. Introduction

Our paper is motivated by the following characteristics of online video traffic:

• The amount of video traffic is growing exponentially [?], [10]: YouTube estimates that 24 hours of video are
uploaded to its site every minute; thousands of films and TV shows are available for streamed viewing from
sites such as Netflix, Amazon and iTunes. Cisco projects that video will account for 60% of Internet traffic
by 2013.

• The demand for video titles is time-varying and heavy-tailed [6]. The need for on-demand video delivery,
where users can "channel surf" by switching across the menu of thousands of available videos, is growing
rapidly.

A fundamental challenge in supporting such a large and diverse on-demand infrastructure is the degree to which
this infrastructure can be distributed and maintained at low cost. It seems clear that architectures biased towards
centralized distribution are not likely to be scalable as video adoption grows. YouTube, for example, pays millions of
dollars per month on bandwidth costs alone [10]. Peer-to-Peer (P2P) systems save cost and adjust load automatically,
yet they do not provide acceptable quality of service (QoS) for the viewers of all but the most popular videos.

To match the heavy-tailed demand patterns at low maintenance cost, researchers and engineers have introduced
the concept of helpers and explored the design of helper-assisted P2P VoD systems. Helper nodes are
"micro-servers" which, relative to the scale of the system, have only limited individual resources of storage and
bandwidth for the video streaming service. In the PPStream system, for example [?], each peer dedicates about 1 GB
of its local storage to cache previously watched videos and helps serve users, thereby reducing the load on the
central server. The concept of helpers has also been used in other P2P streaming applications including Xunlei [3]
and PPLive [?]. In this paper, we envision an ecosystem in which a variety of devices, including set-top boxes,
inexpensive PCs and small servers such as CDNs, are incentivized to participate in an economy of helpers.

Minimizing the server load in a helper-assisted P2P VoD system is a challenging problem under practical
constraints. First, the extreme range of videos and the sheer amount of content make it impractical to store
and serve every video on any individual helper. Given a number of distributed helper nodes whose storage and
bandwidth resources are limited relative to the scale of the system, it is important to answer the questions of
"what fraction of what videos should be stored on each helper to optimize overall system demand patterns" and
"how much bandwidth should helpers allocate to each of their requesting users?" Second, due to practical connection
overheads, there are limits on how many users each helper can simultaneously connect to and vice versa. Therefore,
an important question is "how should we build the optimal helper-user overlay topology?" Since the number of
helper-user topology configurations is exponential in the number of nodes, topology building is a challenging
combinatorial problem. Third, video popularity is time-varying, and helpers and users may randomly join and leave
the system, making it difficult to keep track of the supply and demand patterns in real time. It is desirable that
the system can adapt to these fluctuations with minimum overhead.

In this paper, we present a helper-assisted P2P VoD system to solve these challenges. Our system has the following 
distinguishing attributes: 

• It minimizes the server load. To do this, we jointly optimize over helpers' video storage and bandwidth
allocations, and helper-user topology. By utilizing Lagrangian decomposition and Markov chain approximation
based arguments, we design two distributed algorithms running in tandem: a primal-dual resource
allocation algorithm and a "soft-worst-neighbor-choking" topology-building algorithm.

• The proposed algorithm is fully distributed and easy to implement. Peers can dynamically optimize system 
resources by taking actions based on only local information yet being able to achieve global optimality with 
provable convergence. Helpers are "plug-and-play", i.e., a newly deployed helper will automatically connect 
to a set of interested users and load balance its storage and bandwidth according to the up-to-date supply and 
demand patterns with minimum maintenance requirement. 

• Thanks to the simple distributed solution, our proposed system is adaptable to varying supply and demand
patterns across multiple video channels irrespective of video popularity. Our system achieves minimum server
load under highly heterogeneous combinations of supply and demand patterns, and is robust to system dynamics
of user/helper churn, user/helper asynchrony, and random delays in the network.

Simulation results validate the feasibility and effectiveness of the proposed algorithms and offer new insights into 
building practical P2P VoD systems. 

II. Related Work 

Distributed bandwidth allocation for P2P streaming was studied in several works [13], [15]. Wu and Li proposed a
rate allocation scheme for a single-video P2P live streaming application without helpers and showed its convergence
and optimality [13]. Wang et al. proposed a distributed solution to minimize the weighted sum of server load and
non-ISP-friendly traffic [13] under a single video scenario. The problem of being able to switch streams (channels)
has been studied in the live P2P streaming case by Wu et al. [16], who propose the concept of view-coupling [14],
[15]. When there is a storage constraint, Huang et al. [11] suggested using the proportional replication strategy,
i.e., replicating videos in the system proportionally to their demand. However, this ignores the available
system bandwidth for the videos, and can result in poor performance for videos with low demand, as was observed
by Wang and Lin [?]. Optimal multi-channel on-demand solutions with both bandwidth and storage constraints
are challenging primarily because it is difficult to effectively aggregate instantaneous user demand and keep track
of available system resources in a distributed manner.

The concept of helpers has been addressed by several authors [14], [11], [18]. Wang et al. studied a single-video
helper-assisted P2P live streaming scenario and proposed that each helper downloads only one coded packet of
the segment that is currently being streamed. Simulation results showed that the proposed system can achieve
significantly improved streaming bitrate without incurring additional server load. Zhang et al. [11] and He et
al. [18] individually proposed similar concepts of using helpers in a single-video P2P VoD application to boost
system performance. He et al. also proposed a distributed algorithm to allocate bandwidth among helpers and users
assuming a given helper assignment and a fixed helper-user connection topology. However, all of these works focus
on a single video scenario and do not consider the helpers' storage constraints, which are imposed by the sheer
aggregate video volume in the common scenario with an arbitrarily large number of videos. Furthermore, a fixed
overlay topology was assumed and was not optimized for the overall system performance. BitTorrent [21] uses
a "worst-neighbor-choking" algorithm to update the overlay topology, in which users periodically connect to a new
and randomly chosen neighbor and choke the worst performing neighbor. In contrast, we propose a
"soft-worst-neighbor-choking" algorithm and prove its optimality.

There are a number of works on practical P2P VoD system design. Annapureddy et al. proposed a P2P
VoD system called Redcarpet [19]. The authors proposed an efficient video block dissemination algorithm in a
mesh-based P2P system, and showed that pre-fetching and network coding techniques can greatly improve system
performance. Simulation results showed their system can achieve small start-up time and smooth video playback.
Huang et al. [11] studied the challenges and the architectural design issues of a large-scale P2P-VoD system based
on the experiences of a real system deployed by PPLive. Such challenges include coordinating content storage
distribution, content discovery, and peer scheduling. There are also a number of measurement studies of practical
P2P VoD systems [9], [10].

Our work contributes to the VoD literature in several respects. First, we account for the practical constraint of
bounded user/helper connections and propose a distributed algorithm that optimizes the overlay topology. Second,
we target the problem of a multi-channel helper-assisted P2P VoD system with both storage and bandwidth constraints
and jointly optimize their allocation. Third, our distributed solution comes with provable guarantees and allows
videos to be served efficiently and with effective response time irrespective of video popularity. Our system
is "plug-and-play": it can easily handle user/helper dynamics and video demand pattern changes, and it does so
with minimal central coordination.

III. Problem Setup 

We first formulate the problem of minimizing the server load in a static helper-assisted VoD system, and then
design distributed solutions that allow the system to adapt to varying demand patterns in dynamic situations
with minimum maintenance overhead. Table I lists the relevant notations.

A. Problem Overview and Assumptions 

Consider a VoD system where $M$ videos are served by a dedicated central server and a group of helpers. The
dedicated central server fills in any required system deficit to guarantee the real-time streaming requirements of all
the users. Each video $m$ has constant streaming rate $r_m$, duration $l_m$ and size $V_m = r_m l_m$,
$m = 1, 2, \ldots, M$. There are $I_m$ users in each video session $m$, and every user $i_m$ watches only one video
at a time. Consider $J$ helpers in the system, where each helper $j$ has storage capacity $S_j$ and upload bandwidth
capacity $B_j$, $j = 1, 2, \ldots, J$.

We assume that the download bandwidths do not represent a system bottleneck, as is true in typical Internet
access scenarios today. We also assume decoupled roles of users and helpers, i.e., users can only request service
and helpers can only provide service. In the case that users can also help redistribute the video content, one can still
conceptually separate their roles as service-requesting and service-offering entities, as proposed by Zhang et al. [18]
and Wu et al. [?]. We will refer to both users and helpers as "peers".

Due to practical limits in connection overhead, we consider that each user $i_m$ (helper $j$) can simultaneously
connect to at most $N^{\max}_{i_m}$ ($N^{\max}_{j}$) neighbor nodes from its candidate neighbor set
$\mathcal{N}_{i_m}$ ($\mathcal{N}_j$). Denote by $\mathcal{C}$ the set of all possible helper-user topology
configurations, where each configuration $c \in \mathcal{C}$ is a set of links which connect the users and helpers
and which satisfy the bounded-neighbors constraints. Denote by $\mathcal{N}^c_{i_m}$ ($\mathcal{N}^c_{j,m}$) the set
of active helpers (users) that user $i_m$ (helper $j$) connects to under configuration $c$. Helper $j$'s upload rate
to user $i_m \in \mathcal{N}^c_{j,m}$ under $c$ is denoted by $x^c_{j i_m}$. For simplicity, we will drop the
superscript $c$ in $x^c_{j i_m}$ and use $x_{j i_m}$ instead; the configuration should be clear from the context.

We break video $m$ into segments, each having $k$ packets of equal size, e.g., 1 KB. A helper increases (decreases)
its stored portion of video $m$ by downloading (offloading) in units of one packet per segment for all of video
$m$'s segments. Each packet that the helper stores for each segment is coded using a random linear combination
of the $k$ original packets of that segment. In this way, any coded packets from the helpers are equally useful to
users in need of the corresponding segments. Consequently, a helper with a fraction $f_{jm}$ of video $m$ can supply
at the rate of $f_{jm} r_m$ to each user in video session $m$, regardless of users' playback times and what they
receive from other helpers. The above coding and storage arrangement simplifies the system design as well as the
problem formulation.
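
As a small worked illustration of this storage arrangement, the following Python sketch computes a helper's stored fraction, its per-user supply rate and its storage footprint when it keeps a given number of coded packets per segment. The function name `helper_supply` and the example numbers are illustrative assumptions and are not taken from the system described here.

```python
def helper_supply(n_coded, k, r_m, l_m):
    """Return (f_jm, per-user supply rate, storage used) for one helper and one video.

    n_coded: coded packets the helper keeps per segment (0..k)
    k:       number of original packets per segment
    r_m:     streaming rate of video m
    l_m:     duration of video m
    """
    if not 0 <= n_coded <= k:
        raise ValueError("n_coded must lie in [0, k]")
    f_jm = n_coded / k            # fraction of video m stored by helper j
    rate = f_jm * r_m             # helper can supply f_jm * r_m to each user
    storage = f_jm * r_m * l_m    # f_jm * V_m, with V_m = r_m * l_m
    return f_jm, rate, storage


# Hypothetical example: 5 of 20 coded packets per segment, rate 400, duration 100.
fraction, rate, storage = helper_supply(5, 20, 400, 100)
```

Under these assumed numbers the helper holds a quarter of the video and can therefore offer a quarter of the streaming rate to each connected user.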



Table I
Key Notations

Notation                              Definition
$M$                                   total number of videos
$I_m$                                 total number of users watching video $m$
$J$                                   total number of helpers
$r_m$, $l_m$, $V_m$                   video $m$'s streaming rate, duration and size
$\mathcal{C}$                         set of feasible overlay configurations
$\mathcal{N}_{i_m}$                   user $i_m$'s helper neighborhood
$\mathcal{N}_j$                       helper $j$'s user neighborhood
$\mathcal{N}^c_{i_m}$                 set of helpers connected to user $i_m$ under $c$
$\mathcal{N}^c_{j,m}$                 set of users connected to helper $j$ under $c$
$B_j$, $S_j$                          upload, storage capacity of helper $j$
$x_{j i_m}$                           upload rate from helper $j$ to user $i_m$
$f_{jm}$                              fraction of video $m$ stored by helper $j$, in [0, 1]
$k_{j i_m}$                           helper $j$'s availability price to user $i_m$
$\lambda_j$, $\mu_j$                  bandwidth, storage prices of helper $j$

Note: we use bold-type to denote vectors.


B. Problem Formulation 

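Minimizing the server load is equivalent to maximizing the sum of helpers' effective contribution to all the users:
$\max_{c,\mathbf{f},\mathbf{x}} \sum_{m=1}^{M} \sum_{i_m=1}^{I_m} \min\big(\sum_{j \in \mathcal{N}^c_{i_m}} x_{j i_m},\, r_m\big)$.
Here we implicitly assume that helpers' download is a transient cost and is negligible compared to their
contribution. Incorporating the constraints, we arrive at the following optimization problem:

\begin{align}
\max_{c,\mathbf{f},\mathbf{x}} \quad & \sum_{m=1}^{M} \sum_{i_m=1}^{I_m} \min\Big( \sum_{j \in \mathcal{N}^c_{i_m}} x_{j i_m},\; r_m \Big) \tag{1} \\
\text{s.t.} \quad & x_{j i_m} \le f_{jm} r_m, \quad \forall j, m, i_m \in \mathcal{N}^c_{j,m} \tag{2} \\
& \sum_{m=1}^{M} \sum_{i_m \in \mathcal{N}^c_{j,m}} x_{j i_m} \le B_j, \quad \forall j \tag{3} \\
& \sum_{m=1}^{M} f_{jm} V_m \le S_j, \quad \forall j \tag{4} \\
& 0 \le f_{jm} \le 1, \quad \forall j, m \tag{5} \\
& c \in \mathcal{C} \tag{6}
\end{align}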

Constraints (2) state that each helper's upload to a neighboring user $i_m$ who is viewing video $m$ cannot
exceed its available service rate for video $m$. Constraints (3), (4) and (5) are feasibility constraints on bandwidth
and storage. Constraint (6) is combinatorial, representing the bounded-neighbor helper-user topology constraints. The
above problem is a joint storage, bandwidth, and helper-user topology optimization problem, and is challenging to
solve even in a centralized manner due to its combinatorial nature. In the next section, we design distributed
algorithms and prove their convergence to a near-optimal solution.
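
For a fixed topology, the objective (1) and constraints (2)-(5) can be evaluated directly. The following Python sketch does so for small instances; the dictionary-based data layout and the function names are assumptions made for illustration, and the non-negativity check on rates is an implicit assumption rather than a numbered constraint.

```python
def helper_contribution(x, r, users_video):
    """Objective (1): sum over users of min(total received rate, r_m).

    x:           {(helper, user): rate} over the active links of the topology
    r:           {video: streaming rate r_m}
    users_video: {user: video being watched}
    """
    received = {i: 0.0 for i in users_video}
    for (_, i), rate in x.items():
        received[i] += rate
    return sum(min(received[i], r[users_video[i]]) for i in users_video)


def is_feasible(x, f, r, V, B, S, users_video, tol=1e-9):
    """Check constraints (2)-(5); f maps (helper, video) to the stored fraction."""
    for (j, i), rate in x.items():
        m = users_video[i]
        # (2): link rate is bounded by f_jm * r_m (and assumed non-negative).
        if rate < -tol or rate > f.get((j, m), 0.0) * r[m] + tol:
            return False
    for j in B:
        upload = sum(rate for (jj, _), rate in x.items() if jj == j)
        stored = sum(frac * V[m] for (jj, m), frac in f.items() if jj == j)
        if upload > B[j] + tol or stored > S[j] + tol:  # (3) and (4)
            return False
    return all(-tol <= frac <= 1 + tol for frac in f.values())  # (5)
```

The server load for the same instance is then the total demand, i.e. the sum of $r_m$ over all users, minus the value returned by `helper_contribution`.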

IV. Distributed Solution 

Constraints (2)-(5) are independent of constraint (6), which allows us to decompose the problem into a resource
allocation problem and a topology building problem and solve them in tandem. We present our solutions to each
problem in the following subsections.



A. Storage and Bandwidth Allocation 

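Fixing topology $c$, and assigning Lagrangian variables $k_{j i_m}$ to constraints (2), $\lambda_j$ to constraints
(3), and $\mu_j$ to constraints (4), we obtain the following partial Lagrangian:

\[
\min_{\mathbf{k},\boldsymbol{\lambda},\boldsymbol{\mu} \ge 0} \; \max_{0 \le \mathbf{f} \le 1,\; \mathbf{x} \ge 0} \;
\sum_{m,i_m} \min\Big( \sum_{j \in \mathcal{N}^c_{i_m}} x_{j i_m},\; r_m \Big)
- \sum_{j,m,i_m} k_{j i_m} \big( x_{j i_m} - f_{jm} r_m \big)
- \sum_{j=1}^{J} \lambda_j \Big( \sum_{m=1}^{M} \sum_{i_m \in \mathcal{N}^c_{j,m}} x_{j i_m} - B_j \Big)
- \sum_{j=1}^{J} \mu_j \Big( \sum_{m=1}^{M} f_{jm} V_m - S_j \Big)
\tag{7}
\]

whose optimal value is denoted by $U(c)$. For simplicity, we abbreviate $\sum_{m=1}^{M} \sum_{i_m=1}^{I_m}$ as
$\sum_{m,i_m}$ and $\sum_{j=1}^{J} \sum_{m=1}^{M} \sum_{i_m \in \mathcal{N}^c_{j,m}}$ as $\sum_{j,m,i_m}$. By
rearranging the terms, the above problem can be solved successively in the primal and dual variables.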

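Specifically, given $\mathbf{k}$, $\boldsymbol{\lambda}$ and $\boldsymbol{\mu}$, we have the subproblem in
$\mathbf{x}$ and $\mathbf{f}$:

\[
\max_{\mathbf{x} \ge 0} \sum_{m,i_m} \Big( \min\Big( \sum_{j \in \mathcal{N}^c_{i_m}} x_{j i_m},\; r_m \Big)
- \sum_{j \in \mathcal{N}^c_{i_m}} (\lambda_j + k_{j i_m})\, x_{j i_m} \Big)
+ \max_{0 \le \mathbf{f} \le 1} \sum_{j,m} \Big( r_m \sum_{i_m \in \mathcal{N}^c_{j,m}} k_{j i_m} - \mu_j V_m \Big) f_{jm}
\tag{8}
\]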

The unique structure of the partial Lagrangian allows us to use a simple primal-dual algorithm [8] as its solution, 
which we state in the following theorem. 

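Theorem 1. The following resource allocation algorithm converges to the optimal solution to problem (7):

\[
\begin{aligned}
\dot{x}_{j i_m} &= \alpha \big( g_{x_{j i_m}} - (\lambda_j + k_{j i_m}) \big)^{[0,+\infty)}_{x_{j i_m}}, && \forall j, m, i_m \in \mathcal{N}^c_{j,m} \\
\dot{f}_{jm} &= \beta \Big( r_m \sum_{i_m \in \mathcal{N}^c_{j,m}} k_{j i_m} - \mu_j V_m \Big)^{[0,1]}_{f_{jm}}, && \forall j, m \\
\dot{\lambda}_j &= \gamma \Big( \sum_{m=1}^{M} \sum_{i_m \in \mathcal{N}^c_{j,m}} x_{j i_m} - B_j \Big)^{[0,+\infty)}_{\lambda_j}, && \forall j \\
\dot{\mu}_j &= \delta \Big( \sum_{m=1}^{M} f_{jm} V_m - S_j \Big)^{[0,+\infty)}_{\mu_j}, && \forall j \\
\dot{k}_{j i_m} &= \epsilon \big( x_{j i_m} - f_{jm} r_m \big)^{[0,+\infty)}_{k_{j i_m}}, && \forall j, m, i_m \in \mathcal{N}^c_{j,m}
\end{aligned}
\tag{9}
\]

where

\[
(h)^{[a,b]}_{y} =
\begin{cases}
\min(0, h), & y \ge b; \\
h, & a < y < b; \\
\max(0, h), & y \le a,
\end{cases}
\]

$g_{x_{j i_m}}$ is the partial derivative of the function $g$ with respect to $x_{j i_m}$,
$g = \min\big( \sum_{j \in \mathcal{N}^c_{i_m}} x_{j i_m},\, r_m \big)$, and
$\alpha, \beta, \gamma, \delta, \epsilon$ are small step sizes.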

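Proof: At optimal, the following KKT conditions of the Lagrangian in (7) should also hold:

\[
\begin{aligned}
\big( g_{x^*_{j i_m}} - (\lambda^*_j + k^*_{j i_m}) \big)^{[0,+\infty)}_{x^*_{j i_m}} &= 0, \\
\Big( r_m \sum_{i_m \in \mathcal{N}^c_{j,m}} k^*_{j i_m} - \mu^*_j V_m \Big)^{[0,1]}_{f^*_{jm}} &= 0, \\
\lambda^*_j \Big( \sum_{m=1}^{M} \sum_{i_m \in \mathcal{N}^c_{j,m}} x^*_{j i_m} - B_j \Big) &= 0, \\
\mu^*_j \Big( \sum_{m=1}^{M} f^*_{jm} V_m - S_j \Big) &= 0, \\
k^*_{j i_m} \big( x^*_{j i_m} - f^*_{jm} r_m \big) &= 0,
\end{aligned}
\]

where $\mathbf{x}^*$ and $\mathbf{f}^*$ are the primal optimal and $\boldsymbol{\lambda}^*$, $\boldsymbol{\mu}^*$ and
$\mathbf{k}^*$ are the dual optimal. Strictly speaking, $g$ is not differentiable at
$\sum_{j \in \mathcal{N}^c_{i_m}} x_{j i_m} = r_m$, where we simply let $g_{x_{j i_m}} = 0$ in practice without
affecting the performance.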

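Denote by $\mathbf{y} = (\mathbf{x}, \mathbf{f}, \boldsymbol{\lambda}, \boldsymbol{\mu}, \mathbf{k})$ and by
$\mathbf{y}^* = (\mathbf{x}^*, \mathbf{f}^*, \boldsymbol{\lambda}^*, \boldsymbol{\mu}^*, \mathbf{k}^*)$. To prove that
$\mathbf{y} \to \mathbf{y}^*$, we propose the following generalized energy function:

\[
V(\mathbf{y}) = \frac{1}{2\alpha} \|\mathbf{x} - \mathbf{x}^*\|^2 + \frac{1}{2\beta} \|\mathbf{f} - \mathbf{f}^*\|^2
+ \frac{1}{2\gamma} \|\boldsymbol{\lambda} - \boldsymbol{\lambda}^*\|^2
+ \frac{1}{2\epsilon} \|\mathbf{k} - \mathbf{k}^*\|^2 + \frac{1}{2\delta} \|\boldsymbol{\mu} - \boldsymbol{\mu}^*\|^2
\]

and show that (a) $V(\mathbf{y}) > 0$ $\forall \mathbf{y} \ne \mathbf{y}^*$ and $V(\mathbf{y}^*) = 0$;
(b) $\dot{V}(\mathbf{y}) \le 0$ $\forall \mathbf{y}$ and $\dot{V}(\mathbf{y}^*) = 0$.
(a) is obvious since $V(\mathbf{y})$ is a summation of quadratic terms. To show (b), we derive $\dot{V}(\mathbf{y})$:

\[
\begin{aligned}
\dot{V}(\mathbf{y}) ={}& \sum (x_{j i_m} - x^*_{j i_m}) \big( g_{x_{j i_m}} - (\lambda_j + k_{j i_m}) \big)^{[0,+\infty)}_{x_{j i_m}}
+ \sum (f_{jm} - f^*_{jm}) \Big( r_m \sum k_{j i_m} - \mu_j V_m \Big)^{[0,1]}_{f_{jm}} \\
&+ \sum (\lambda_j - \lambda^*_j) \Big( \sum x_{j i_m} - B_j \Big)^{[0,+\infty)}_{\lambda_j}
+ \sum (k_{j i_m} - k^*_{j i_m}) \big( x_{j i_m} - f_{jm} r_m \big)^{[0,+\infty)}_{k_{j i_m}} \\
&+ \sum (\mu_j - \mu^*_j) \Big( \sum f_{jm} V_m - S_j \Big)^{[0,+\infty)}_{\mu_j}
\end{aligned}
\]

by applying partial derivatives and plugging in the dynamic system equations. For simplicity, we have omitted the
sets over which the terms are summed up. It is easy to see that $\dot{V}(\mathbf{y}^*) = 0$. Now we can upper-bound
$\dot{V}(\mathbf{y})$ as follows:

\[
\begin{aligned}
\dot{V}(\mathbf{y}) \le{}& \sum (x_{j i_m} - x^*_{j i_m}) \big( g_{x_{j i_m}} - (\lambda_j + k_{j i_m}) \big)
+ \sum (f_{jm} - f^*_{jm}) \Big( r_m \sum k_{j i_m} - \mu_j V_m \Big) \\
&+ \sum (\lambda_j - \lambda^*_j) \Big( \sum x_{j i_m} - B_j \Big)
+ \sum (k_{j i_m} - k^*_{j i_m}) ( x_{j i_m} - f_{jm} r_m )
+ \sum (\mu_j - \mu^*_j) \Big( \sum f_{jm} V_m - S_j \Big) \\
={}& \sum (x_{j i_m} - x^*_{j i_m}) \big( g_{x_{j i_m}} - g_{x^*_{j i_m}} \big)
+ \sum (x_{j i_m} - x^*_{j i_m}) \big( g_{x^*_{j i_m}} - (\lambda^*_j + k^*_{j i_m}) \big) \\
&+ \sum (f_{jm} - f^*_{jm}) \Big( r_m \sum k^*_{j i_m} - \mu^*_j V_m \Big)
+ \sum (\lambda_j - \lambda^*_j) \Big( \sum x^*_{j i_m} - B_j \Big) \\
&+ \sum (k_{j i_m} - k^*_{j i_m}) ( x^*_{j i_m} - f^*_{jm} r_m )
+ \sum (\mu_j - \mu^*_j) \Big( \sum f^*_{jm} V_m - S_j \Big) \\
\le{}& \sum (x_{j i_m} - x^*_{j i_m}) \big( g_{x_{j i_m}} - g_{x^*_{j i_m}} \big) + 0 + 0 + 0 + 0 + 0 \\
\le{}& 0,
\end{aligned}
\]

where the first inequality is obtained by dropping the projections, and the second inequality is obtained by applying
the set of KKT conditions. The last inequality holds due to the concavity of the function
$\min\big( \sum_{j \in \mathcal{N}^c_{i_m}} x_{j i_m},\, r_m \big)$ over $x_{j i_m}$.

Using (a) and (b), it follows from the Krasovskii-LaSalle principle [20] that $\mathbf{y}$ converges to the set
$\mathcal{S} = \{\mathbf{y} \mid \dot{V}(\mathbf{y}) = 0\}$. It remains to show that $\mathcal{S}$ contains no
trajectories other than $\{\mathbf{y} = \mathbf{y}^*\}$. Due to the space constraint, we omit
that part of the proof in this paper. Interested readers can refer to [?] for details. ■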
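
To make the update rules of Theorem 1 concrete, the following Python sketch applies one forward-Euler step of the primal-dual dynamics for a fixed topology. The discretization, the dense-array data layout and the function name `primal_dual_step` are assumptions made for illustration; the theorem itself is stated in continuous time, and the sketch makes no claim about convergence speed or step-size choice.

```python
import numpy as np


def primal_dual_step(state, A, vid, r, V, B, S, steps):
    """One Euler step of the projected primal-dual dynamics (illustrative only).

    state: (x, f, lam, mu, k) with shapes (J, I), (J, M), (J,), (J,), (J, I)
    A:     (J, I) 0/1 matrix of active helper-user links under the topology
    vid:   (I,) integer array, vid[i] is the video watched by user i
    r, V:  (M,) streaming rates and video sizes
    B, S:  (J,) upload and storage capacities
    steps: (alpha, beta, gamma, delta, epsilon) step sizes
    """
    x, f, lam, mu, k = state
    alpha, beta, gamma, delta, eps = steps
    onehot = np.eye(len(r))[vid]                    # (I, M) user-to-video map
    received = (x * A).sum(axis=0)                  # total rate received per user
    g = (received < r[vid]).astype(float)           # derivative of min(., r_m), 0 at the kink
    cap = f[:, vid] * r[vid]                        # f_jm * r_m for every link
    coef = ((k * A) @ onehot) * r - mu[:, None] * V  # r_m * sum_i k_ji - mu_j * V_m

    # Projection onto [0, inf) or [0, 1] is done by clipping after the step.
    x_new = np.maximum(0.0, x + alpha * (g - lam[:, None] - k)) * A
    f_new = np.clip(f + beta * coef, 0.0, 1.0)
    lam_new = np.maximum(0.0, lam + gamma * ((x * A).sum(axis=1) - B))
    mu_new = np.maximum(0.0, mu + delta * (f @ V - S))
    k_new = np.maximum(0.0, k + eps * (x - cap)) * A
    return x_new, f_new, lam_new, mu_new, k_new
```

In a distributed deployment each user would only compute its own `g` and each helper its own rows of the arrays; the dense form is used here purely for brevity.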

We make the following remarks: 

• The resource allocation algorithm has intuitive economic explanations. Specifically, one can view $k_{j i_m}$ and
$\lambda_j$ as the video availability prices and bandwidth prices which are induced by helper $j$'s storage and
bandwidth constraints and which helper $j$ charges user $i_m$. One can also view $\mu_j$ as storage prices that
helper $j$ has to pay. Indeed, the larger the video availability and bandwidth prices are, the smaller $x_{j i_m}$
(that user $i_m$ requests from helper $j$) is. The larger the video availability price and the smaller the storage
price, the larger $f_{jm}$ (that helper $j$ increases for video $m$) is. Similarly, the values of the prices are also
driven by the relative difference between the given demand and the available resources. For example, the increase in
video availability price $k_{j i_m}$ is proportional to the difference between the demand $x_{j i_m}$ and the
available rate $f_{jm} r_m$. This economic framework can be potentially extended to building incentive mechanisms
into the system.
• It is also not hard to see that the primal variables $\mathbf{x}$ and $\mathbf{f}$ will converge to the following
intuitive solutions. In problem (8), every user $i_m$ will choose to request $x_{j i_m}$ with the smallest combined
price $(\lambda_j + k_{j i_m})$ until it reaches the maximum possible value. If the summation of received rates has
not reached $r_m$, it will choose to request $x_{j i_m}$ with the second smallest combined price. It will continue to
do so until the summation of the received rates reaches $r_m$. Similarly, the solution for $f_{jm}$,
$m = 1, 2, \ldots, M$ can be obtained by water-filling helper $j$'s storage $S_j$ in descending order of the combined
prices $\big( r_m \sum_{i_m \in \mathcal{N}^c_{j,m}} k_{j i_m} - \mu_j V_m \big)$, which matches
with helper $j$'s goal to maximize its "profit".
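
The user-side behavior described in the second remark can be sketched as a greedy fill over helpers sorted by combined price. The function below is a hypothetical illustration of that description only: it takes the prices and per-link caps as given and does not model how the prices themselves evolve.

```python
def greedy_request(r_m, offers):
    """Request rates from the cheapest helpers first until r_m is covered.

    r_m:    streaming rate the user needs
    offers: list of (helper_id, combined_price, cap), where combined_price stands
            for lambda_j + k_ji and cap for the maximum rate f_jm * r_m
    Returns {helper_id: requested rate}.
    """
    remaining = r_m
    request = {}
    for helper, _price, cap in sorted(offers, key=lambda offer: offer[1]):
        take = min(cap, remaining)   # never exceed the cap or the residual demand
        request[helper] = take
        remaining -= take
    return request


# Hypothetical prices and caps for three neighboring helpers.
rates = greedy_request(4.0, [("h1", 0.3, 2.0), ("h2", 0.1, 1.5), ("h3", 0.5, 3.0)])
```

With these assumed inputs the cheapest helper is filled to its cap first, and the most expensive one only covers the residual demand.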

B. Markov Approximation of Overlay Optimization 

Recall that $U(c)$ is the optimal value of problem (7). It is then left to solve:

\[
\max_{c} \; U(c) \quad \text{s.t.} \quad c \in \mathcal{C} \tag{10}
\]

However, the set of possible overlay configurations given peers' neighborhood constraints is exponential in the
number of nodes, which makes the problem NP-hard even in a centralized manner. To overcome this difficulty, we
re-write it as follows:



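\[
\max_{\mathbf{p}} \; \sum_{c \in \mathcal{C}} p_c U(c) \quad
\text{s.t.} \quad \sum_{c \in \mathcal{C}} p_c = 1 \;\text{ and }\; 0 \le p_c \le 1, \; \forall c \in \mathcal{C} \tag{11}
\]

One can see that problems (11) and (10) are equivalent: the optimal solution to problem (11) is obtained by
setting $p_{c^*} = 1$ for $c^* = \arg\max_{c' \in \mathcal{C}} U(c')$ and $p_c = 0$ for all other
$c \in \mathcal{C}$. Relaxing the objective $\sum_{c \in \mathcal{C}} p_c U(c)$ by adding a weighted entropy term
$\frac{1}{\kappa} H(\mathbf{p})$, where $\kappa > 0$ and
$H(\mathbf{p}) = -\sum_{c \in \mathcal{C}} p_c \log p_c$, we have the following theorem shown by Chen et al. [?]: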



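Theorem 2. The optimal solution to:

\[
\max_{\mathbf{p}} \; \sum_{c \in \mathcal{C}} p_c U(c) - \frac{1}{\kappa} \sum_{c \in \mathcal{C}} p_c \log p_c \quad
\text{s.t.} \quad \sum_{c \in \mathcal{C}} p_c = 1 \;\text{ and }\; 0 \le p_c \le 1, \; \forall c \in \mathcal{C} \tag{12}
\]

is given by:

\[
p^*_c = \frac{\exp(\kappa U(c))}{\sum_{c' \in \mathcal{C}} \exp(\kappa U(c'))}, \quad \forall c \in \mathcal{C}. \tag{13}
\]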



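Proof: The Lagrangian of problem (12) is given by:

\[
L(\mathbf{p}, \boldsymbol{\nu}, \mu) = \sum_{c \in \mathcal{C}} p_c U(c) - \frac{1}{\kappa} \sum_{c \in \mathcal{C}} p_c \log p_c
+ \sum_{c \in \mathcal{C}} \nu_c p_c + \mu \Big( 1 - \sum_{c \in \mathcal{C}} p_c \Big),
\]

where $\nu_c$ and $\mu$ are the Lagrangian variables. At optimal, the following KKT conditions [?] should hold:

\[
U(c) - \frac{1}{\kappa} (\log p^*_c + 1) + \nu^*_c - \mu^* = 0, \quad \forall c \in \mathcal{C},
\]

where $\mathbf{p}^*$ is the primal optimal, and $\boldsymbol{\nu}^*$ and $\mu^*$ are the dual optimal. Writing
$p^*_c$ as a function of $\nu^*_c$ and $\mu^*$, i.e., $p^*_c = \exp\big( \kappa (U(c) + \nu^*_c - \mu^*) - 1 \big)$,
and applying the constraint $\sum_{c \in \mathcal{C}} p^*_c = 1$, we obtain:

\[
\mu^* = \frac{1}{\kappa} \log \Big( \sum_{c \in \mathcal{C}} \exp\big( \kappa (U(c) + \nu^*_c) - 1 \big) \Big).
\]

Plugging $\mu^*$ back into $p^*_c$, and noting that $p^*_c > 0$ so that $\nu^*_c = 0$ by complementary slackness,
we get:

\[
p^*_c = \frac{\exp\big( \kappa (U(c) + \nu^*_c) - 1 \big)}{\exp(\kappa \mu^*)}
= \frac{\exp(\kappa U(c))}{\sum_{c' \in \mathcal{C}} \exp(\kappa U(c'))}, \quad \forall c \in \mathcal{C}.
\]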



Note that the optimal solution $\mathbf{p}^*$ is in a product form, and thus is the stationary distribution of some
time-reversible Markov Chain (MC), hence the term MC approximation. Note that as $\kappa \to +\infty$,
$p^*_{c^*} \to 1$ and therefore the optimal solution of the relaxed problem (12) approaches that of the original
problem (10). It is also easy to see that for a fixed $\kappa$, the error term $\frac{1}{\kappa} H(\mathbf{p})$ is
bounded by $\frac{1}{\kappa} \log |\mathcal{C}|$.
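
The effect of the entropy weight can be checked numerically on a toy set of configurations. The sketch below computes the product-form distribution of Theorem 2 and the resulting gap to the best configuration; the utility values in the usage line are made up for illustration, and the function names are not from the source.

```python
import math


def gibbs_distribution(utilities, kappa):
    """Product-form distribution with p_c proportional to exp(kappa * U(c))."""
    u_max = max(utilities)  # shift by the maximum for numerical stability
    weights = [math.exp(kappa * (u - u_max)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def optimality_gap(utilities, kappa):
    """max_c U(c) minus the expected utility under the product-form distribution."""
    p = gibbs_distribution(utilities, kappa)
    expected = sum(pc * u for pc, u in zip(p, utilities))
    return max(utilities) - expected


# Hypothetical utilities of four overlay configurations.
gaps = [optimality_gap([1.0, 2.0, 3.5, 3.0], kappa) for kappa in (0.5, 2.0, 10.0)]
```

The gap is at most $\frac{1}{\kappa} \log |\mathcal{C}|$ and shrinks as $\kappa$ grows, in line with the discussion above.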

Our motivation behind this approximation is that it can potentially lead to distributed solutions. In this case, one
can construct a MC with the overlay topology configurations as its states, and carefully design transition rates
$q_{c,c'}$ such that the overall system will probabilistically jump between topology configurations while staying in
the best configuration, i.e., $c^*$, for most of the time, and that the system performance will approach the optimal.
One straightforward design of $q_{c,c'}$ is given by:

\[
q_{c,c'} =
\begin{cases}
\dfrac{\tau}{\exp(\kappa U(c))}, & c, c' \text{ satisfy } \mathcal{S}; \\
0, & \text{otherwise,}
\end{cases}
\tag{14}
\]

where $\tau > 0$ is a constant, $U(c)$ is the overall system utility under state $c$, and $\mathcal{S}$ is the
following set of conditions:

• $\exists\, \tilde{c}$ s.t. $\tilde{c} \subset c$, $\tilde{c} \subset c'$, $|c \setminus \tilde{c}| = |c' \setminus \tilde{c}| = 1$;

• Link $c \setminus \tilde{c}$ and link $c' \setminus \tilde{c}$ originate from the same peer.

In other words, only the following state transitions $c \to c'$ are valid: a single peer first drops a single
connection to one of its neighbors (from $c \to \tilde{c}$) and then randomly adds a new single connection from its
neighborhood (from $\tilde{c} \to c'$). $\tilde{c}$ is an auxiliary state and can be viewed as the intermediate state
in which a single link from a single peer is dropped from $c$, where $c \setminus \tilde{c}$ represents the dropped
link. It is not hard to see that $q_{c,c'}$ satisfies the detailed balance equations
$p^*_c q_{c,c'} = p^*_{c'} q_{c',c}$, thus the stationary distribution $\mathbf{p}^*$ in equation (13) can be
achieved. We refer to this as the "uniform-neighbor-choking" algorithm because peers uniformly randomly choke
neighbors in periods that depend on $U(c)$.

However, one caveat of the above design is that a peer still needs to know the global information U (c) that needs 
to be broadcast to all the peers from time to time. This burdens the system with overhead. It is desirable to have 
a distributed algorithm in which each peer needs only local information to perform such update and still achieve 
global optimality. 
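
The stationary distribution of the uniform-neighbor-choking chain can be verified numerically on a toy state space. The sketch below assumes the transition rate out of state $c$ to each valid neighbor state is $\tau / \exp(\kappa U(c))$, represents the set of valid transitions by a symmetric 0/1 adjacency matrix (an assumption standing in for the conditions on dropped and added links), and solves for the stationary distribution with NumPy.

```python
import numpy as np


def stationary_distribution(U, adjacency, kappa, tau=1.0):
    """Stationary distribution of a continuous-time MC with rates tau * exp(-kappa * U(c)).

    U:         utilities of the configurations, length n
    adjacency: symmetric (n, n) 0/1 matrix, 1 where a transition c -> c' is valid
    """
    U = np.asarray(U, dtype=float)
    adj = np.asarray(adjacency, dtype=float)
    Q = adj * (tau * np.exp(-kappa * U))[:, None]   # off-diagonal rates q[c, c']
    np.fill_diagonal(Q, 0.0)
    Q -= np.diag(Q.sum(axis=1))                     # generator rows sum to zero
    n = len(U)
    # Solve pi Q = 0 together with sum(pi) = 1 in the least-squares sense.
    lhs = np.vstack([Q.T, np.ones(n)])
    rhs = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return pi


# Hypothetical four-state example where the valid transitions form a ring.
ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
pi = stationary_distribution([1.0, 2.0, 0.5, 1.5], ring, kappa=1.2)
```

For a connected transition graph, the result should coincide with the product-form distribution proportional to $\exp(\kappa U(c))$, which is what detailed balance predicts.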

C. Soft-Worst-Neighbor-Choking Algorithm 

Motivated by the above discussions, we propose the "soft-worst-neighbor-choking" algorithm by designing:

\[
q_{c,c'} = \frac{\tau}{\exp(\kappa\, x_{c \setminus \tilde{c}})}, \tag{15}
\]

where $x_{c \setminus \tilde{c}}$ is the rate on the dropped link $c \setminus \tilde{c}$, and $c, \tilde{c}, c'$
should satisfy $\mathcal{S}$. Here, the transition rates depend only on local information, namely the link rates of
peers' active neighbors.

We now give the overall distributed algorithm. For simplicity, the algorithm is stated from the perspective of
user $i_m$; those at other users/helpers are similar and self-explanatory.

Topology building - "soft-worst-neighbor-choking"

• Initialization: User $i_m$ randomly chooses and connects to $N^{\max}_{i_m}$ neighboring helpers from its
neighborhood $\mathcal{N}_{i_m}$ and does the following steps.

• Step 1: User $i_m$ independently draws an exponentially distributed random variable with mean
$\dfrac{1}{\tau \big( |\mathcal{N}_{i_m}| - N^{\max}_{i_m} \big) \sum_{j \in \mathcal{N}^c_{i_m}} \exp(-\kappa x_{j i_m})}$
and counts down to zero.

• Step 2: After the count-down expires, user $i_m$ drops neighbor $j$ with probability
$\dfrac{\exp(-\kappa x_{j i_m})}{\sum_{j' \in \mathcal{N}^c_{i_m}} \exp(-\kappa x_{j' i_m})}$ and
randomly chooses and connects to a new neighbor from the set $\mathcal{N}_{i_m} \setminus \mathcal{N}^c_{i_m}$. It
then repeats Step 1.

We make the following remarks:
We make the following remarks: 

• It is easy to see that the algorithm gives $q_{c,c'}$ as in (15).

• The algorithm is fully distributed, i.e., each peer runs the algorithm independently. Compared to the
"uniform-neighbor-choking" algorithm, peers only need to know local information of the link rates of their one-hop
neighbors. The algorithm is intuitive: the larger the link rate, the less likely it is dropped and vice versa.

• It is worth noting that BitTorrent [21] uses a "worst-neighbor-choking" algorithm, where each peer periodically
chokes the link with the worst rate. In our case, link rates are weighted exponentially. The worst link is
choked with the highest probability (which approaches 1 as $\kappa \to +\infty$) while other links can also be choked
occasionally, hence the term "soft-worst-neighbor-choking" algorithm. This algorithm is also generalizable to
other P2P systems.
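
The per-user choking step can be sketched in a few lines of Python. The sketch assumes the drop probability of a neighbor is proportional to $\exp(-\kappa x)$ for its link rate $x$, that the count-down has the mean given in Step 1, and that the candidate set is strictly larger than the active set; the function names and the dictionary layout are illustrative choices, not part of the original protocol description.

```python
import math
import random


def choke_probabilities(rates, kappa):
    """Drop probability of each active neighbor, proportional to exp(-kappa * rate)."""
    x_min = min(rates.values())  # shift for numerical stability
    weights = {j: math.exp(-kappa * (x - x_min)) for j, x in rates.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def countdown_mean(rates, kappa, tau, n_candidates):
    """Mean of the exponential count-down in Step 1."""
    n_spare = n_candidates - len(rates)  # candidates that are not currently connected
    if n_spare <= 0:
        raise ValueError("no spare candidate neighbor to switch to")
    return 1.0 / (tau * n_spare * sum(math.exp(-kappa * x) for x in rates.values()))


def choke_step(rates, candidates, kappa, rng=random):
    """Step 2: drop one neighbor softly biased to the worst link, add a random new one."""
    probs = choke_probabilities(rates, kappa)
    helpers = list(probs)
    dropped = rng.choices(helpers, weights=[probs[j] for j in helpers])[0]
    spare = [j for j in candidates if j not in rates]
    added = rng.choice(spare)
    new_rates = {j: x for j, x in rates.items() if j != dropped}
    new_rates[added] = 0.0  # the rate on the fresh link starts at zero
    return dropped, added, new_rates
```

A user would draw its waiting time with `random.expovariate(1.0 / countdown_mean(...))` and call `choke_step` when the timer expires.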

D. Performance Analysis of Soft-Worst-Neighbor-Choking 

We now state the mathematical underpinnings behind this design of the algorithm and analyze its performance.
It is interesting to see that the rate $x_{c \setminus \tilde{c}}$ on link $c \setminus \tilde{c}$ can be viewed as
an approximation to $U(c) - U(\tilde{c})$, which is the overall system performance difference before and after link
$c \setminus \tilde{c}$ is dropped. If $x_{c \setminus \tilde{c}} = U(c) - U(\tilde{c})$, i.e., the helper cannot
re-utilize the upload rate on $c \setminus \tilde{c}$ after it is dropped, then we have:

\[
q_{c,c'} = \frac{\tau}{\exp\big( \kappa (U(c) - U(\tilde{c})) \big)},
\]

where $c, \tilde{c}, c'$ satisfy $\mathcal{S}$. In this case, $q_{c,c'}$ still satisfies
$p^*_c q_{c,c'} = p^*_{c'} q_{c',c}$, and the stationary distribution is no different from that of the
uniform-neighbor-choking algorithm in (13). However, $U(c) - U(\tilde{c}) \le x_{c \setminus \tilde{c}}$ in general,
because the rate $x_{c \setminus \tilde{c}}$ on the dropped link may be fully or partially re-utilized by the helper
for its other neighbors. In the following, we show that under some minor assumptions, one can still achieve a
stationary distribution in product form similar to that in (13).

Denote by $\omega_c = x_{c \setminus \tilde{c}} - (U(c) - U(\tilde{c}))$ the error term resulting from approximating
$U(c) - U(\tilde{c})$ with $x_{c \setminus \tilde{c}}$. Depending on overlay $c$ and the actual converged values of
the storage and rate allocation algorithm, $\omega_c$ may take values anywhere between 0 and $B_{\max}$, where
$B_{\max}$ is the maximum over all helpers' upload capacity. We quantize such error $\omega_c$ into $n_c + 1$ values
$\big[ 0, \frac{B_{\max}}{n_c}, \frac{2 B_{\max}}{n_c}, \ldots, B_{\max} \big]$, and assume that
$\omega_c = \frac{k B_{\max}}{n_c}$ with probability $\rho_{c_k}$, $k = 0, 1, \ldots, n_c$ and
$\sum_{k=0}^{n_c} \rho_{c_k} = 1$. Under these assumptions, we show the following theorem:

Theorem 3. The stationary distribution $p_c$ of the MC with transition rates (15) is given by:

\[
p_c = \frac{\sigma_c \exp(\kappa U(c))}{\sum_{c' \in \mathcal{C}} \sigma_{c'} \exp(\kappa U(c'))}, \quad \forall c \in \mathcal{C}, \tag{16}
\]

where $\sigma_c = \sum_{k=0}^{n_c} \rho_{c_k} \exp\big( \kappa \frac{k B_{\max}}{n_c} \big)$.

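Proof: Consider a modified MC as follows: expand each state $c$ of the original MC to $n_c + 1$ states $c_k$,
$k = 0, 1, \ldots, n_c$, with the following transition rates:

\[
q_{c_k, c'_{k'}} = \frac{\tau \rho_{c'_{k'}}}{\exp\Big( \kappa \big( U(c) - U(\tilde{c}) + \frac{k B_{\max}}{n_c} \big) \Big)}, \tag{17}
\]

where $\rho_{c'_{k'}}$, $k' = 0, 1, \ldots, n_{c'}$, is the probability measure on the expanded states and
$\sum_{k'=0}^{n_{c'}} \rho_{c'_{k'}} = 1$. Note that $c_0$ refers to state $c$ with zero error. Using equation (17)
and the detailed balance equations $p_{c_k} q_{c_k, c'_{k'}} = p_{c'_{k'}} q_{c'_{k'}, c_k}$, we have
$\forall c_k, c'_{k'}$:

\[
\frac{p_{c_k}}{\rho_{c_k} \exp\Big( \kappa \big( U(c) + \frac{k B_{\max}}{n_c} \big) \Big)}
= \frac{p_{c'_{k'}}}{\rho_{c'_{k'}} \exp\Big( \kappa \big( U(c') + \frac{k' B_{\max}}{n_{c'}} \big) \Big)}
= \text{const}.
\]

Using $\sum_{c \in \mathcal{C}} \sum_{k=0}^{n_c} p_{c_k} = 1$, we obtain:

\[
p_{c_k} = \frac{\rho_{c_k} \exp\Big( \kappa \big( U(c) + \frac{k B_{\max}}{n_c} \big) \Big)}
{\sum_{c' \in \mathcal{C}} \sum_{k'=0}^{n_{c'}} \rho_{c'_{k'}} \exp\Big( \kappa \big( U(c') + \frac{k' B_{\max}}{n_{c'}} \big) \Big)}.
\]

Denote by $\sigma_c = \sum_{k=0}^{n_c} \rho_{c_k} \exp\big( \kappa \frac{k B_{\max}}{n_c} \big)$ and we have:

\[
p_c = \sum_{k=0}^{n_c} p_{c_k} = \frac{\sigma_c \exp(\kappa U(c))}{\sum_{c' \in \mathcal{C}} \sigma_{c'} \exp(\kappa U(c'))}.
\]

■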

We make the following remarks: 

• If $n_c = \text{const}$ $\forall c$ and the distribution $\rho_{c_k}$, $k = 0, 1, \ldots, n_c$ is the same
$\forall c$, then $\sigma_c = \text{const}$. In this case, $p_c = p^*_c$.

• The total variation distance $d_{TV}(\mathbf{p}^*, \mathbf{p})$ can be upper bounded by
$1 - \exp(-\kappa B_{\max})$. This is because
$p^*_c - p_c = p^*_c \Big( 1 - \dfrac{\sigma_c \sum_{c' \in \mathcal{C}} \exp(\kappa U(c'))}{\sum_{c' \in \mathcal{C}} \sigma_{c'} \exp(\kappa U(c'))} \Big)$.
Since $\sigma_c \in [1, \exp(\kappa B_{\max})]$, the fraction can be lower bounded by $\exp(-\kappa B_{\max})$ and
hence the result.

• In general, it is difficult to give a tight analytical lower bound on the performance
$\sum_{c \in \mathcal{C}} U(c) p_c$ because $\sigma_c$ is unknown. However, note that when $\kappa \to +\infty$,
$p_{c^*} \to 1$ where $c^* = \arg\max_{c' \in \mathcal{C}} U(c')$, and the system
approaches the optimal $U(c^*)$. We will show in our simulations that the algorithm performs well and is close
to the optimal.
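
The total variation bound in the second remark can be spot-checked numerically. The sketch below assumes the perturbed distribution has the form $p_c \propto \sigma_c \exp(\kappa U(c))$ with each $\sigma_c$ lying in $[1, \exp(\kappa B_{\max})]$; the random utilities and perturbation factors are arbitrary test inputs, not data from the system.

```python
import math
import random


def tv_distance_to_gibbs(U, sigma, kappa):
    """Total variation distance between p* ~ exp(kappa*U) and p ~ sigma*exp(kappa*U)."""
    u_max = max(U)
    w_star = [math.exp(kappa * (u - u_max)) for u in U]
    w = [s * ws for s, ws in zip(sigma, w_star)]
    z_star, z = sum(w_star), sum(w)
    p_star = [v / z_star for v in w_star]
    p = [v / z for v in w]
    return 0.5 * sum(abs(a - b) for a, b in zip(p_star, p))


# Arbitrary test instance: six configurations, perturbation factors in the allowed range.
rng = random.Random(1)
kappa, b_max = 0.8, 1.5
U = [rng.uniform(0.0, 5.0) for _ in range(6)]
sigma = [rng.uniform(1.0, math.exp(kappa * b_max)) for _ in range(6)]
distance = tv_distance_to_gibbs(U, sigma, kappa)
bound = 1.0 - math.exp(-kappa * b_max)
```

For any such instance `distance` should not exceed `bound`, and it vanishes when all perturbation factors are equal, matching the first remark.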

E. Discussions 

Our overall scheme is fully distributed, i.e., each peer runs the algorithm independently and makes changes based
on only one-hop local information. Users pass the derivative of their utility function to helpers to perform
distributed resource allocation; users and helpers periodically choke their neighbors based on the relative
performance of their one-hop links.

The deployment of such a practical system is easy: newly deployed helper nodes can automatically connect to a
set of interested users and load balance their storage and bandwidth resources using the distributed algorithms. This
simple solution helps achieve minimum maintenance overhead and can easily adapt to system dynamics. When the
system's supply and demand patterns change due to helpers/users joining or leaving the system and shifts in video
popularity, the helper nodes will automatically update their content caching, reallocate their bandwidth resources
and dynamically change their neighborhood selections in a distributed and local manner, so as to best match
the global system demand and available resources and optimize overall system utility. We summarize the simple
algorithms at both classes of peer nodes in the next section.

V. System Implementation 

A. Back-up Server and Tracker 

A centralized server with all the video content acts as the "life-line" to supplement the deficit (if
any) in the system. A tracker keeps track of all the participating peers and assists in building an overlay
network. When a user/helper joins the system, it obtains from the tracker the IP addresses of a list of helpers/users
in its neighborhood. It then connects to its maximum allowed number of neighbors, chosen randomly from its
neighborhood.
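
The join step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate list stands in for whatever the tracker returns, and the function name, the addresses and the neighbor limit are all hypothetical.

```python
import random

def pick_initial_neighbors(candidates, max_neighbors, rng=random):
    """Choose up to max_neighbors distinct peers uniformly at random."""
    k = min(max_neighbors, len(candidates))
    return rng.sample(list(candidates), k)

# Hypothetical candidate helper addresses returned by the tracker.
helpers = [f"10.0.0.{i}" for i in range(1, 21)]
neighbors = pick_initial_neighbors(helpers, max_neighbors=5, rng=random.Random(0))
print(neighbors)
```

If the neighborhood is smaller than the allowed number of neighbors, the sketch simply connects to every candidate.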



Algorithm 1 User Protocol

 1: Initialization: Set $t_1 = t_2 = 0$, draw $t_3 \sim \mathrm{Exp}\big(r(|N_{i_m}| - N_{i_m}^{\max}) \sum_{j \in N_{i_m}} \exp(-\kappa x_{ji_m})\big)$, and iterate:
 2: if $\mathrm{mod}(t_1, T_{i_m}) = 0$ then
 3:    Count the number of received packets from each neighboring helper over the period $[t_1 - T_{i_m}, t_1 - 1]$, and
       update the corresponding average rate $x_{ji_m}$. Compute the derivative $g_{x_{ji_m}}$ of its utility function and send it
       to all $j \in N_{i_m}$.
 4: end if
 5: if $\mathrm{mod}(t_2, \text{BUFFER\_TIME}) = 0$ then
 6:    Download from the server all the missing packets in the next BUFFER_TIME worth of segments.
 7: end if
 8: if $t_3 = 0$ then
 9:    Drop neighbor helper $j$ with probability $\exp(-\kappa x_{ji_m}) / \sum_{j' \in N_{i_m}} \exp(-\kappa x_{j'i_m})$. Randomly choose and connect to a
       new neighbor from the remaining neighborhood to replace $j$, and set $x_{ji_m} = 0$. Draw
       $t_3 \sim \mathrm{Exp}\big(r(|N_{i_m}| - N_{i_m}^{\max}) \sum_{j \in N_{i_m}} \exp(-\kappa x_{ji_m})\big)$.
10: end if
11: $t_1 \leftarrow t_1 + 1$, $t_2 \leftarrow t_2 + 1$, $t_3 \leftarrow t_3 - 1$.
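
One way to read the "soft-worst-neighbor-choking" step of the user protocol is as a softmin over the measured rates. The Python sketch below assumes that the drop probability of a neighbor is proportional to $\exp(-\kappa x)$, where $x$ is its measured rate; the helper names and the normalized rates are hypothetical.

```python
import math

def drop_probabilities(rates, kappa):
    """Softmin over measured rates: slower neighbors are more likely to be dropped."""
    x_min = min(rates.values())
    # Subtracting x_min leaves the ratios unchanged and avoids underflow.
    weights = {j: math.exp(-kappa * (x - x_min)) for j, x in rates.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

rates = {"h1": 0.9, "h2": 0.5, "h3": 0.1}  # hypothetical normalized rates
probs = drop_probabilities(rates, kappa=10.0)
print(probs)
```

With a large $\kappa$ the rule behaves almost like deterministic worst-neighbor choking, while a small $\kappa$ spreads the drop probability more evenly across neighbors.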



B. Packet Exchange Protocol 

The video packets each helper stores are coded using a rateless code and downloaded from the server. To
validate the proposed algorithms under asynchronous scenarios, we let each helper $j$ maintain its own clock. Helper
$j$ also updates its bandwidth/storage allocation only once every $T_j$ seconds and keeps an outgoing buffer
worth $T_j$ seconds of its upload bandwidth capacity. Users maintain a buffer of length BUFFER_TIME
that covers an integer number of video segments. As a user decodes the packets and plays the
video in the segment right ahead of its playback time, it also receives packets of the next unfulfilled segment from
the helpers. Upon finishing BUFFER_TIME worth of segments, the user immediately fetches from
the server any packets missing in the next BUFFER_TIME range. Each user also has its own clock and
a bandwidth request update period $T_{i_m}$. The detailed packet exchange protocols for users and helpers are
described in Algorithms 1 and 2, respectively, based on the theoretical analysis given in Section IV.
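
The server fallback at the end of each buffer window can be sketched as a simple count of missing packets. The Python sketch below assumes, for illustration only, that a segment is decodable once a fixed number of rateless-coded packets has arrived; the threshold of 100 packets, the segment indices and the function name are hypothetical.

```python
def missing_packets(received, packets_per_segment, segments):
    """Return {segment: packets still missing} for the next buffer window."""
    return {
        s: max(packets_per_segment - received.get(s, 0), 0)
        for s in segments
    }

# Hypothetical state: helpers delivered all of segment 4 and parts of 5 and 6.
received = {4: 100, 5: 72, 6: 15}
to_fetch = missing_packets(received, packets_per_segment=100, segments=range(4, 8))
server_packets = sum(to_fetch.values())
print(to_fetch, server_packets)
```

The total `server_packets` is the part of the window that the helpers did not cover, which is what contributes to the server load.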



It is worth noting that our analysis relies on a few simplifying assumptions. First, we have assumed in the
algorithms that peers have synchronized clocks, which can be difficult to
maintain in practice. Second, when solving problem (10), we have assumed that the underlying resource allocation
algorithm has fully converged and ignored the different time scales. Although there exist a number of techniques 
that can address these issues 0, (H, they are not the focus of our paper and we omit the heavy discussions involved 
in the analysis. Instead, we validate the feasibility and effectiveness of the proposed algorithms through
simulations that capture real-world scenarios, including the effects of asynchronous clocks among peers and random
network delays. We also show in the simulations that the distributed scheme adapts well to system fluctuations,
including changes in video demand patterns and peer dynamics.

A. Experimental Setup 

We set total number of videos M = 4, helpers J = 70 and users Ylm=l ^ m = ^0- Table UII shows each video's 
streaming rate and the fraction of users watching it. Helpers have upload and storage capacities with different 
distributions shown in Tables III and IV. Each peer can potentially connect to every other peer in the system, but
has a maximum allowed number of neighbors uniformly randomly chosen from [3, 10]. This setup is based on 
practical data in commercialized P2P systems CD-El, which makes it easy to test the robustness of the proposed 
algorithms in highly heterogeneous scenarios. We also set the step sizes $\alpha = 1$, $\beta = 0.01$, $\gamma = \delta = 0.5$, $\epsilon = 0.05$ for




VI. Simulation Results 



Algorithm 2 Helper Protocol

 1: Initialization: Set $x_{ji_m} = 0$, $f_{jm} = 0$, $\lambda_j = 0$, $\mu_j = 0$ and $k_{ji_m} = 0$. Set $t_1 = 0$, draw
    $t_2 \sim \mathrm{Exp}\big(r(|N_j| - N_j^{\max}) \sum_{i_m \in N_j^m} \exp(-\kappa x_{ji_m})\big)$, and iterate:
 2: if $\mathrm{mod}(t_1, T_j) = 0$ then
 3-7:  Update $x_{ji_m}$, $f_{jm}$, $\lambda_j$, $\mu_j$ and $k_{ji_m}$ with the primal-dual update rules of the storage and bandwidth
       allocation algorithm, using the step sizes $\alpha$, $\beta$, $\gamma$, $\delta$ and $\epsilon$ and the projections onto $[0, 1]$ and
       $[0, +\infty)$ specified there.
 8:    Allocate a number of packets equivalent to $x_{ji_m} T_j$ to user $i_m$ for all $i_m$ and put them in the outgoing buffer.
 9:    Re-allocate its storage of videos according to $f_{jm}$.
10: end if
11: Send remaining packets in the buffer to neighbor users.
12: if $t_2 = 0$ then
13:    Drop neighbor user $i_m$ with probability $\exp(-\kappa x_{ji_m}) / \sum_{i'_m \in N_j^m} \exp(-\kappa x_{ji'_m})$. Randomly choose and connect to a
       new neighbor from the remaining neighborhood to replace $i_m$, and set $x_{ji_m} = 0$. Draw
       $t_2 \sim \mathrm{Exp}\big(r(|N_j| - N_j^{\max}) \sum_{i_m \in N_j^m} \exp(-\kappa x_{ji_m})\big)$.
14: end if
15: $t_1 \leftarrow t_1 + 1$, $t_2 \leftarrow t_2 - 1$.



the bandwidth and storage allocation algorithm, and set $\kappa = 10$, $r = 0.01$ in the topology update algorithm. These
parameters are chosen to guarantee smooth algorithm updates and small Markov chain (MC) approximation errors.
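
To give a feel for the topology update parameters, the Python sketch below draws the exponential countdown timer of the topology update. It assumes that the timer rate has the form $r(|N| - N^{\max}) \sum \exp(-\kappa x)$ over the current neighbors; the normalized link rates, the neighborhood size of 20 and the neighbor limit of 3 are hypothetical values, not taken from the simulations.

```python
import math
import random

def choke_rate(rates, neighborhood_size, max_neighbors, kappa=10.0, r=0.01):
    """Timer rate: r * (|N| - N_max) * sum_j exp(-kappa * x_j)."""
    spare = neighborhood_size - max_neighbors
    return r * spare * sum(math.exp(-kappa * x) for x in rates)

def draw_choke_timer(rates, neighborhood_size, max_neighbors, rng, kappa=10.0, r=0.01):
    """Draw the countdown until the next neighbor-drop attempt."""
    return rng.expovariate(choke_rate(rates, neighborhood_size, max_neighbors, kappa, r))

rng = random.Random(1)
rates = [0.9, 0.5, 0.1]  # hypothetical normalized link rates
rate = choke_rate(rates, neighborhood_size=20, max_neighbors=3)
samples = [draw_choke_timer(rates, 20, 3, rng) for _ in range(20000)]
mean_timer = sum(samples) / len(samples)  # close to 1 / rate
print(rate, mean_timer)
```

Under this reading, a peer whose links all perform well has a small timer rate and therefore changes its neighbors rarely, while a peer with poorly performing links re-draws its neighbors more often.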

B. Convergence in the Static Case 

We first test the convergence of the storage and bandwidth allocation algorithm in the static case, where all
peers stay in the system during the entire simulation time and perform no topology update. We first focus on the
synchronous case, where peers share a synchronous clock and have an update period of 1 second. Figure 1(a) shows
the instantaneous server load versus simulation time. Also shown for comparison is the system's intrinsic deficit,
i.e., the total users' streaming rate demand minus the total helpers' upload bandwidth. The initial server load is high, but
it quickly drops to a stable point. Sub-figures (b) and (c) in Figure 1 show the convergence of a particular
helper's (ID = 1) upload rate and storage allocation. The convergence results for the shadow prices are similar,
and we omit them here due to space limitations.
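
The intrinsic deficit is simple arithmetic. As a hedged illustration, the Python sketch below computes the expected total helper upload capacity from the distribution in Table III for J = 70 helpers, together with a generic deficit function; the function names and the two-user, two-helper example are hypothetical.

```python
def expected_value(distribution):
    """Mean of a discrete distribution given as {value: percent}."""
    assert sum(distribution.values()) == 100
    return sum(v * pct for v, pct in distribution.items()) / 100.0

def intrinsic_deficit(user_rates_kbps, helper_uploads_kbps):
    """Total streaming demand minus total helper upload capacity (kbps)."""
    return sum(user_rates_kbps) - sum(helper_uploads_kbps)

# Helper upload capacity distribution from Table III: {kbps: percent}.
upload_dist = {256: 5, 384: 10, 512: 15, 640: 40, 768: 15, 896: 10, 1024: 5}
mean_upload = expected_value(upload_dist)  # 640.0 kbps per helper
total_upload = 70 * mean_upload            # 44800.0 kbps for J = 70 helpers
print(mean_upload, total_upload)
```

The expected helper supply is thus 44,800 kbps; whatever part of the users' total streaming demand exceeds the supply must be served by the server under any allocation.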



Table II 

Video streaming rate distribution 



Streaming rate (kbps) 
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896 


896 


1152 


Fraction (%) 


10 


20 


50 


20 



Table III

Helper upload capacity distribution

Upload (kbps)    256   384   512   640   768   896   1024
Fraction (%)       5    10    15    40    15    10      5



Table IV

Helper storage capacity distribution

Storage (MB)     768   960   1152   1344   1536   1728   1920


Fraction (%) 
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10 
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Figure 1. Convergence results of storage and bandwidth allocation algorithm in a static and synchronous setting, where no overlay topology 
update nor peer dynamics is present. Peers have synchronized update periods of 1 second. (a) Required server load;
(b) bandwidth allocation and (c) storage allocation for the helper (ID = 1).



C. Asynchrony and Random Network Delay 

To test the robustness of the bandwidth and storage allocation algorithm in real networks, we add asynchrony
and random network delay to the system. Specifically, peers have asynchronous clocks and choose update periods
uniformly at random from the set {1, 3, 5, 7, 9} seconds. In addition, every peer has a communication delay to
each of its neighbors randomly chosen from 1 to 5 seconds. These numbers are chosen to stress test the
system. Figure 2 shows the server load, bandwidth and storage allocation for the same helper (ID = 1). Compared
to Figure 1, the bandwidth allocations experience more fluctuations, but they still center around comparable
average values. The server load and the helper's storage allocation are quite stable, which demonstrates the robustness of
the algorithm. In the following sections, our simulation experiments apply the same asynchrony and random
network delays unless mentioned otherwise.
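
The asynchrony settings above can be generated as follows. This Python sketch is only an illustration: it assumes that delays are drawn from a continuous uniform distribution on [1, 5] seconds (the text does not say whether delays are integer-valued), and the peer and link names are hypothetical.

```python
import random

UPDATE_PERIODS = (1, 3, 5, 7, 9)  # seconds, as stated in the text
DELAY_RANGE = (1.0, 5.0)          # seconds, as stated in the text

def draw_async_settings(peers, links, rng):
    """Assign each peer an update period and each link a delay."""
    periods = {p: rng.choice(UPDATE_PERIODS) for p in peers}
    delays = {link: rng.uniform(*DELAY_RANGE) for link in links}
    return periods, delays

rng = random.Random(7)
peers = ["u1", "u2", "h1"]
links = [("u1", "h1"), ("u2", "h1")]
periods, delays = draw_async_settings(peers, links, rng)
print(periods, delays)
```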



[Figure 2 panels, horizontal axis: simulation time (seconds). (a) Server load, async.: instantaneous server load and
system's intrinsic deficit. (b) Bandwidth alloc., async.: upload rate to neighbors 1 to 4. (c) Storage alloc., async.:
videos 1 to 4.]



Figure 2. Convergence of storage and bandwidth allocation algorithm in the case of asynchrony and random network delays, with no
overlay topology update nor peer dynamics. Peers have asynchronous clocks and random update periods uniformly drawn from {1, 3, 5, 7, 9}
seconds. Each peer also has a communication delay to its neighbors randomly chosen from [1, 5] seconds. (a) Required server
load; (b) bandwidth allocation and (c) storage allocation for the helper (ID = 1).



D. Effectiveness of the Overlay Topology Update 

As is evident from Figure 1, the server load cannot reach its minimum value, the intrinsic system deficit,
without topology update. This is because some helpers have poor performance to their connected neighbors and
have not fully utilized their upload bandwidth. We run the distributed overlay topology update algorithm on top of
the resource allocation algorithm with other parameters and configurations unchanged. Figure 3(a) and (b) show the server
load versus simulation time without and with overlay topology update, respectively, where Figure 3(a) is simply
Figure 2(a) shown again for comparison. It can be seen that overlay topology update yields an approximately 14%
reduction in server load and eventually achieves the intrinsic system deficit, which is the theoretical lower bound
on the server load.
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Figure 3. Server load versus simulation time. Peers have asynchronous update periods and random network delays, (a) Server load without 
overlay topology update; (b) server load with overlay topology update. 



E. Effects of Dynamics 

We show in this section that our system is "plug-and-play", i.e., it requires minimal maintenance and
adapts automatically to system dynamics. Peers only need to run their distributed algorithms regardless of
system fluctuations, and they keep up with the supply and demand patterns across multiple channels.

We first examine the effects of peer dynamics. To do this, we add new users and new helpers that join the
system following a Poisson process with a mean inter-arrival time of 20 seconds. The newly joined peers follow the
demand and resource distributions listed in Tables II, III and IV. In addition, every peer stays in the system for an
exponentially distributed amount of time with an average of 200 seconds. To examine how fast the system responds to
dynamics, we simulate until 1000 seconds but stop the dynamic process at the 600th second. Figure 4(a) and (b) show how
the server load varies with time, without and with overlay topology updates, respectively. The available system resources
also change due to dynamics, as is evident from the varying intrinsic system deficit shown in the figures. The results show
that the algorithm keeps up with the dynamics. When overlay topology update is present, the system can also
approach the minimum server load. Note that the instantaneous intrinsic system deficit stops at a different value
in the two cases only because the pseudo-random numbers generated by the computer differ with and without
the topology update. The results demonstrate the robustness of the resource allocation and topology update
algorithms to system dynamics.
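
The scale of the churn in this experiment can be estimated with Little's law: with one arrival every 20 seconds on average and a mean stay of 200 seconds, the long-run average number of newly joined peers of each class is 200 / 20 = 10. The Python sketch below checks this by simulation. It assumes exponential inter-arrival times with a mean of 20 seconds, counts only the newly joined peers of one class, and uses a long hypothetical horizon rather than the 1000-second runs of the experiments.

```python
import random

def average_population(mean_interarrival, mean_lifetime, horizon, rng):
    """Time-averaged number of joined peers present over [0, horizon]."""
    t, occupied_time = 0.0, 0.0
    while True:
        t += rng.expovariate(1.0 / mean_interarrival)  # next arrival time
        if t >= horizon:
            break
        departure = t + rng.expovariate(1.0 / mean_lifetime)
        occupied_time += min(departure, horizon) - t   # time spent in system
    return occupied_time / horizon

rng = random.Random(42)
avg = average_population(mean_interarrival=20.0, mean_lifetime=200.0,
                         horizon=200000.0, rng=rng)
print(avg)  # expected to be close to 200 / 20 = 10 by Little's law
```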

We also use a simple example to illustrate how the system responds to changes in video demand patterns. In
particular, we pick a helper (ID = 4) that has 10 neighbor users, with 7 users watching video 3 and 3 users watching
video 4. At t = 300, we let all the users in video 3 "switch channels", with half of them switching to video 4
and the other half switching to video 2. Figure 5 shows how the helper (ID = 4) responds to this change by
re-allocating its storage resources. Both Figure 5 and Figure 1(c) demonstrate the helper's "plug-and-play" feature,
i.e., helpers can automatically load balance their resources given the system demand patterns.
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Figure 4. Effects of system dynamics on server load. A new user and a new helper will join the system every 20 seconds on average. 
Each peer stays for an average of 200 seconds in the system. (a) Server load without overlay topology update; (b) server load with overlay
topology update.
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Figure 5. Effects of changes in video demand patterns. Users in video session 3 "channel switch" to video 2 and 4 with equal numbers. 



VII. Conclusions 

In this paper, we propose to minimize the server load in a helper-assisted multi-channel P2P VoD system.
Helpers that help provide the VoD service are limited in bandwidth and storage, and each helper and user has a
constraint on the maximum number of neighbors that it can connect to. This problem is critical for exploring
the maximum potential of practical distributed P2P VoD systems. The mixed convex-combinatorial nature of the
problem under practical constraints makes it challenging to solve even in a centralized manner. We tackle this
challenge by designing two distributed algorithms running in tandem: a primal-dual resource allocation algorithm
and a "soft-worst-neighbor-choking" topology building algorithm. The overall scheme is simple to implement and
provably converges to a near-optimal solution. Simulation results show that our proposed algorithm minimizes
the server load and is robust to asynchronous clocks, random network delay, video popularity changes
and peer dynamics. Our proposed system design and algorithm provide useful insight for practical video content
distribution applications. Possible future work includes: (1) designing incentive mechanisms into the system; and (2)
building a practical system prototype.
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